Intro

Background Information: We were tasked by Kirk Bogard, the Associate Vice President for Development and External Relations at Miami University, with exploring a dataset of real student data to find relationships and patterns that he can use to give Miami a competitive advantage. After exploring the data, we identified three variables that point to a potential relationship: survey_salary, survey_internships, and survey_state. We plan to build a regression model that uses the number of internships completed during college to predict salary after graduation, with state as a control variable to account for regional differences in salary. The purpose of this analysis is to provide FSB Career Services with information on the relationship between the number of internships and salary. This will help them give students more accurate guidance and help each student find the best full-time opportunity.

Survey Overview: Usable Response %: Our initial dataset covered much more than internship and salary information. As a result, many observations provided no meaningful information for our model. We removed these observations, for example those without a reported salary after graduation, number of internships, or starting location. Afterward, the dataset shrank to a little over half its original size: of the 3,235 original observations, 1,726 remained (about 53%).
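The filtering step above can be sketched as a simple completeness check. This is a minimal illustration, not the code used for this analysis. It assumes each response is a record with the three fields survey_salary, survey_internships, and survey_state. It also assumes "starting location" corresponds to survey_state and that a missing value is stored as None. The record class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SurveyResponse:
    # Hypothetical record layout; field names mirror the survey variables.
    # Assumes survey_state holds the reported starting location.
    survey_salary: Optional[float]
    survey_internships: Optional[int]
    survey_state: Optional[str]


def is_usable(r: SurveyResponse) -> bool:
    """A response is usable only if salary, internships, and state are all reported."""
    return (
        r.survey_salary is not None
        and r.survey_internships is not None
        and r.survey_state is not None
    )


def usable_percent(n_usable: int, n_total: int) -> float:
    """Share of the original observations that survive filtering, in percent."""
    return 100 * n_usable / n_total


def filter_usable(responses: List[SurveyResponse]) -> Tuple[List[SurveyResponse], float]:
    """Keep complete responses and report the usable response percentage."""
    usable = [r for r in responses if is_usable(r)]
    return usable, usable_percent(len(usable), len(responses))


# Counts reported in this analysis: 1,726 usable out of 3,235 original.
print(round(usable_percent(1726, 3235), 1))  # 53.4
```

Applied to the reported counts, 1,726 of 3,235 observations corresponds to roughly 53.4% usable responses.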

Distribution of Number of Internships: To understand the data better, we examined how many internships students completed before entering the workforce. According to the histogram, most students completed 1 or 2 internships, followed by 3, while relatively few completed 0, 4, or 5. The distribution is roughly bell-shaped but slightly skewed to the right.
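The histogram's frequency table and a rough check of its skew can be computed directly. This is a minimal sketch under two assumptions. First, internship counts range from 0 to 5, as in this dataset. Second, skew is measured with the moment (population) skewness, which is one common choice among several. The sample list is hypothetical and is not the survey data.

```python
from collections import Counter
from typing import Dict, List


def internship_counts(internships: List[int], max_count: int = 5) -> Dict[int, int]:
    """Frequency of each internship count from 0 to max_count, including zeros."""
    tally = Counter(internships)
    return {k: tally.get(k, 0) for k in range(max_count + 1)}


def skewness(values: List[float]) -> float:
    """Moment skewness: positive values indicate a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical sample for illustration only (not the survey data).
sample = [0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
for k, c in internship_counts(sample).items():
    print(f"{k}: {'#' * c}")  # text histogram, one '#' per student
print(skewness(sample) > 0)  # True: the tail extends toward 4 and 5
```

A positive skewness means the tail toward higher internship counts is longer than the tail toward 0, which is what "slightly skewed to the right" describes.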

Internship Effects on Salary: Mean Salary by Number of Internships: This bar chart shows the average salary after graduation, grouped by the number of internships students completed. Average salary rises with each additional internship up to 3 internships, after which it levels out. We expected salary to generally increase with each additional internship, and the chart supports this. However, internships beyond the third do not appear to raise salary further. The average is even slightly lower for students with 5 internships, possibly because so few students completed 5. This suggests that students aiming to maximize salary after graduation may gain little from completing more than 3 internships.
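The bar chart's values are group means: salaries are summed and counted within each internship group, then divided. This minimal sketch assumes the data is available as (internships, salary) pairs. It also reports each group's size, which matters here because the 5-internship mean rests on few observations. The function name and sample rows are hypothetical, not survey data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_salary_by_internships(
    rows: Iterable[Tuple[int, float]]
) -> Dict[int, Tuple[float, int]]:
    """Map each internship count to (average salary, number of students)."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for internships, salary in rows:
        totals[internships] += salary
        counts[internships] += 1
    return {k: (totals[k] / counts[k], counts[k]) for k in sorted(totals)}


# Hypothetical (internships, salary) pairs, not survey data.
rows = [(0, 50000), (0, 54000), (1, 56000), (1, 58000), (3, 61000), (5, 60000)]
print(mean_salary_by_internships(rows))
# {0: (52000.0, 2), 1: (57000.0, 2), 3: (61000.0, 1), 5: (60000.0, 1)}
```

Reporting the group size next to each mean flags groups, such as 5 internships, where a single unusual salary can noticeably move the average.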

Regression Model Predicting Salary by Number of Internships:

Using a regression model, we predicted salary from the number of internships, treating each internship count as a separate category with 0 internships as the baseline.

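Predicted salary increases substantially with each additional internship up to 3 internships. Compared with the 0-internship baseline of about $51,591, predicted salary is about $5,164 higher for 1 internship, $7,696 higher for 2, and $9,793 higher for 3. Four internships predict about the same salary as three ($9,929 above baseline). Five internships predict slightly less ($9,309 above baseline), although that estimate is the least precise, with a standard error of about $4,261.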
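Because internships enter the model as categories, each predicted salary is simply the intercept plus the coefficient for that internship count. The sketch below reproduces that arithmetic using the estimates copied from the lm() output in this report. With a single categorical predictor, these fitted values should equal the group means in the bar chart, assuming the chart uses the same 1,726 observations. The function name is illustrative.

```python
# Estimates copied from the lm() output in this report.
INTERCEPT = 51591.1  # predicted salary with 0 internships (baseline)
EFFECTS = {1: 5163.6, 2: 7696.0, 3: 9793.1, 4: 9929.2, 5: 9308.9}


def predicted_salary(internships: int) -> float:
    """Baseline plus the dummy coefficient for the given internship count (0-5)."""
    if internships == 0:
        return INTERCEPT
    return INTERCEPT + EFFECTS[internships]  # KeyError outside the 0-5 range


for k in range(6):
    print(k, round(predicted_salary(k), 1))
# 0 51591.1, 1 56754.7, 2 59287.1, 3 61384.2, 4 61520.3, 5 60900.0

# Gain from each additional internship relative to one fewer.
for k in range(1, 6):
    print(k, round(predicted_salary(k) - predicted_salary(k - 1), 1))
# 5163.6, 2532.4, 2097.1, 136.1, -620.3
```

The step-by-step gains make the pattern explicit: large increases through the third internship, almost no change at the fourth, and a small drop at the fifth.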

Survey Overview

Overview of survey responses

Value box: Usable Response %

Value box: Usable Responses: 1726

Figure: Distribution of Number of Internships

Internship Effects on Salary

Figure: Mean Salary by Number of Internships

Regression Model Predicting Salary by Number of Internships


Call:
lm(formula = df$survey_salary ~ df$survey_internships)

Residuals:
   Min     1Q Median     3Q    Max 
-51384  -6755    327   5745 115713 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)             51591.1      974.2  52.956  < 2e-16 ***
df$survey_internships1   5163.6     1089.6   4.739 2.32e-06 ***
df$survey_internships2   7696.0     1069.1   7.199 9.05e-13 ***
df$survey_internships3   9793.1     1223.7   8.003 2.21e-15 ***
df$survey_internships4   9929.2     2235.3   4.442 9.48e-06 ***
df$survey_internships5   9308.9     4260.5   2.185    0.029 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11730 on 1720 degrees of freedom
Multiple R-squared:  0.04577,   Adjusted R-squared:  0.04299 
F-statistic:  16.5 on 5 and 1720 DF,  p-value: 6.139e-16
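The summary statistics above are internally consistent, and this can be checked with the standard formulas for adjusted R-squared and the overall F-statistic. This is a sketch that assumes n = 1,726 observations (1,720 residual degrees of freedom plus 6 estimated coefficients) and k = 5 predictors (the dummies for 1 through 5 internships). The R-squared of about 0.046 means the number of internships alone explains roughly 4.6% of the variation in salary.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors (excluding intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F-statistic comparing the model with an intercept-only model."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


N_OBS = 1726       # assumed: 1720 residual df + 6 estimated coefficients
N_PREDICTORS = 5   # dummy variables for 1-5 internships
R2 = 0.04577       # Multiple R-squared from the output above

print(round(adjusted_r2(R2, N_OBS, N_PREDICTORS), 4))  # 0.043
print(round(overall_f(R2, N_OBS, N_PREDICTORS), 1))    # 16.5
```

Both values agree with the reported Adjusted R-squared (0.04299) and F-statistic (16.5), up to rounding of the printed R-squared.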